Air pollution is a growing problem in both developed and developing countries. In Vietnam, it is a pressing concern in big cities such as Hanoi and Ho Chi Minh City, where most air pollution comes from vehicles such as cars and motorbikes. To tackle the problem, this paper develops a solution that estimates emitted PM2.5 pollutants by counting the number of vehicles in traffic. We first surveyed recent object detection models and developed our own traffic surveillance system. The observed traffic density showed a trend similar to the measured PM2.5 with a certain time lag, suggesting a relation between traffic density and PM2.5. We further express this relationship with a mathematical model that estimates the PM2.5 value from the observed traffic density. The estimates correlate strongly with the measured PM2.5 in an urban setting.
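As an illustration of the lagged relationship described above, the following sketch searches for the time lag at which traffic density correlates best with PM2.5 and then fits a straight line at that lag. It is not the paper's model: the linear form, the synthetic data, and the function names `best_lag` and `fit_lagged_linear` are assumptions made only for this example.

```python
import numpy as np

def best_lag(traffic, pm25, max_lag):
    """Return (lag, r): the lag in samples, within 0..max_lag, that maximizes
    the Pearson correlation between traffic[t - lag] and pm25[t]."""
    traffic = np.asarray(traffic, dtype=float)
    pm25 = np.asarray(pm25, dtype=float)
    best, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        x = traffic[: len(traffic) - lag]   # traffic at time t - lag
        y = pm25[lag:]                      # PM2.5 at time t
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best, best_r = lag, r
    return best, best_r

def fit_lagged_linear(traffic, pm25, lag):
    """Least-squares fit of pm25[t] ~ slope * traffic[t - lag] + intercept."""
    x = np.asarray(traffic, dtype=float)[: len(traffic) - lag]
    y = np.asarray(pm25, dtype=float)[lag:]
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# Synthetic data (assumption): PM2.5 follows traffic with a 6-sample delay.
rng = np.random.default_rng(0)
n, true_lag = 400, 6
traffic = 100 + 20 * rng.standard_normal(n)
pm25 = 0.3 * np.roll(traffic, true_lag) + 10 + 0.1 * rng.standard_normal(n)

lag, r = best_lag(traffic, pm25, max_lag=20)
slope, intercept = fit_lagged_linear(traffic, pm25, lag)
print(lag, round(r, 3), round(slope, 3), round(intercept, 2))
```

On real measurements the lag would be expressed in the sampling interval of the sensors, and a linear fit is only one of many possible model forms.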
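Fruit flies are among the insect species most harmful to fruit yield. In AlertTrap, implementing the SSD architecture with different state-of-the-art backbone feature extractors such as MobileNetV1 and MobileNetV2 appears to be a potential solution to the real-time detection problem. SSD-MobileNetV1 and SSD-MobileNetV2 perform well, reaching AP@0.5 of 0.957 and 1.0, respectively. YOLOv4-tiny outperforms the SSD family with an AP@0.5 of 1.0; however, its throughput is slightly slower.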
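The "@0.5" in AP@0.5 refers to the intersection-over-union (IoU) threshold at which a predicted box counts as a correct detection. The sketch below shows only that matching criterion, not the full average-precision computation; the box format `(x1, y1, x2, y2)` and the function names are assumptions for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction matches a ground-truth box when IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold

ground_truth = (0, 0, 10, 10)
print(is_true_positive((2, 0, 12, 10), ground_truth))  # IoU = 80 / 120
print(is_true_positive((5, 0, 15, 10), ground_truth))  # IoU = 50 / 150
```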
Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System, which classifies DR grades, localizes lesion areas, and provides visual explanations; and (ii) DRG-Expert-Interaction, which receives feedback from expert users and improves the DRG-AI-System. To deal with sparse data, we use transfer learning mechanisms to extract invariant feature representations through Wasserstein distance and adversarial learning-based entropy minimization. In addition, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and the loss-function constraints between lesion features and classification features, our approach remains robust to a certain level of noise in user feedback. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.
This study proposes an approach for establishing an optimal multihop ad-hoc network using multiple unmanned aerial vehicles (UAVs) to provide emergency communication in disaster areas. The approach has two stages: the first uses particle swarm optimization (PSO) to find optimal positions at which to deploy the UAVs, and the second uses a behavior-based controller to navigate the UAVs to their assigned positions without colliding with obstacles in an unknown environment. Several constraints related to the UAVs' sensing and communication ranges are imposed to ensure that the approach is applicable in real-world scenarios. A number of simulation experiments with data loaded from real environments have been conducted. The results show that the proposed approach not only succeeds in establishing multihop ad-hoc routes but also meets the requirements for real-time deployment of UAVs.
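For readers unfamiliar with the first stage, the following is a minimal sketch of a standard global-best PSO loop. The abstract does not give the paper's objective function or constraints, so the toy objective here (placing one relay so that the sum of squared distances to two ground nodes is minimal) and all names and parameter values are assumptions for illustration only.

```python
import numpy as np

def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box given as [(low, high), ...] per dimension."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    dim = low.size
    pos = rng.uniform(low, high, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # per-particle best positions
    pbest_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]     # swarm-wide best
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity update: inertia + pull toward personal best + pull toward global best.
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, low, high)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Toy objective (assumption): one relay between two hypothetical ground nodes.
node_a = np.array([10.0, 20.0])
node_b = np.array([70.0, 60.0])

def relay_cost(p):
    return float(np.sum((p - node_a) ** 2) + np.sum((p - node_b) ** 2))

best_pos, best_val = pso_minimize(relay_cost, bounds=[(0, 100), (0, 100)])
print(best_pos, best_val)
```

For this toy cost the optimum is the midpoint of the two nodes; a realistic deployment objective would also have to encode the sensing and communication range constraints mentioned in the abstract.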
A large number of empirical studies on applying self-attention models to recommender systems are based on offline evaluation and metrics computed on standardized datasets, without insight into how these models perform in real-life scenarios. Moreover, many of them do not consider information such as item and customer metadata, although deep-learning recommenders live up to their full potential only when numerous features of heterogeneous types are included. In addition, recommendation models are typically designed to serve only a single use case well, which increases modeling complexity and maintenance costs and may lead to an inconsistent customer experience. In this work, we present a reusable Attention-based Fashion Recommendation Algorithm (AFRA) that utilizes various interaction types with different fashion entities such as items (e.g., shirts), outfits, and influencers, together with their heterogeneous features. Moreover, we leverage temporal and contextual information to address both short- and long-term customer preferences. We show its effectiveness on outfit recommendation use cases, in particular: 1) personalized ranked feed; 2) outfit recommendations by style; 3) similar item recommendation; and 4) in-session recommendations inspired by the most recent customer actions. We present both offline and online experimental results demonstrating substantial improvements in customer retention and engagement.
Over the past years, fashion-related challenges have gained a lot of attention in the research community. Outfit generation and recommendation, i.e., the composition of a set of items of different types (e.g., tops, bottoms, shoes, accessories) that go well together, are among the most challenging ones. That is because items have to be both compatible with each other and personalized to match the taste of the customer. Recently, a plethora of work has targeted these problems by adopting various techniques and algorithms from the machine learning literature. However, to date, there is no extensive comparison of the performance of the different algorithms for outfit generation and recommendation. In this paper, we close this gap by providing a broad evaluation and comparison of various algorithms, including both personalized and non-personalized approaches, using online, real-world user data from one of Europe's largest fashion stores. We present the adaptations we made to some of those models to make them suitable for personalized outfit generation. Moreover, we provide insights for models that have not yet been evaluated on this task, specifically GPT, BERT, and Seq-to-Seq LSTM.
In this paper, we present a robust and low-complexity deep learning model for Remote Sensing Image Classification (RSIC), the task of identifying the scene of a remote sensing image. In particular, we first evaluate several low-complexity benchmark deep neural networks: MobileNetV1, MobileNetV2, NASNetMobile, and EfficientNetB0, each of which has fewer than 5 million (M) trainable parameters. After identifying the best network architecture, we further improve its performance by applying attention schemes to multiple feature maps extracted from the middle layers of the network. Because the attention schemes increase the model footprint, we apply quantization to keep the model within the budget of 5 M trainable parameters. Through extensive experiments on the benchmark dataset NWPU-RESISC45, we obtain a robust, low-complexity model that is very competitive with state-of-the-art systems and has potential for real-life applications on edge devices.
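Persistence diagrams (PDs), often characterized as sets of births and deaths of homology classes, provide a topological representation of graph structure that is often useful in machine learning tasks. Prior works rely on a single graph signature to construct PDs. In this paper, we explore the use of a family of multi-scale graph signatures to enhance the robustness of topological features. We propose a deep learning architecture to handle this set input. Experiments on benchmark graph classification datasets show that our proposed architecture outperforms other persistent homology-based methods and achieves competitive performance compared with state-of-the-art methods that use graph neural networks. In addition, our approach can easily be applied to large input graphs because it does not suffer from limited scalability, which can be an issue for graph kernel methods.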
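Tracking the players and the ball in team sports is key to analyzing performance or enhancing the viewing experience. When the only source of this data is broadcast video, a sports-field registration system is required to estimate the homography and re-project the ball or the players from image space to field space. This paper describes a new basketball court registration framework in the context of the MMSports 2022 Camera Calibration Challenge. The method is based on an encoder-decoder network that estimates the positions of keypoints sampled with perspective-aware constraints. Regression of the basket positions and heavy data augmentation make the model robust to different arenas. Ablation studies show the positive effect of our contributions on the challenge test set. Compared with the challenge baseline, our method divides the squared error by 4.7.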
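The re-projection step mentioned above can be sketched as follows: once a 3x3 homography is known, image points are mapped to field coordinates by a matrix product in homogeneous coordinates followed by division by the last component. The matrix values below are made up for illustration and are not taken from the paper; `project_points` is a hypothetical helper name.

```python
import numpy as np

def project_points(H, points_xy):
    """Apply a 3x3 homography H to an (N, 2) array of points."""
    H = np.asarray(H, dtype=float)
    pts = np.asarray(points_xy, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])   # (x, y) -> (x, y, 1)
    mapped = homog @ H.T                               # rows are H @ [x, y, 1]
    return mapped[:, :2] / mapped[:, 2:3]              # divide by the scale w

# Hypothetical homography with a small perspective term in the last row.
H_example = np.array([
    [1.0,   0.0, 0.0],
    [0.0,   1.0, 0.0],
    [0.001, 0.0, 1.0],
])
image_points = np.array([[100.0, 50.0], [0.0, 0.0]])
field_points = project_points(H_example, image_points)
print(field_points)
```

In a registration system the homography would come from matched keypoints between the broadcast frame and the court model rather than being written by hand.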
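This study presents our approach to automatic Vietnamese image captioning for the healthcare domain, developed for the text processing tasks of the Vietnamese Language and Speech Processing (VLSP) Challenge 2021. Existing image captioning models typically pair an encoder architecture with a long short-term memory (LSTM) decoder to generate sentences. These models perform well on different datasets. Our proposed model also has an encoder and a decoder, but we use a Swin Transformer in the encoder and an LSTM combined with an attention module in the decoder. The study describes the training experiments and techniques we used during the competition. Our model achieves a BLEU4 score of 0.293 on the vietcap4h dataset, which ranks 3rd on the private leaderboard. Our code is available at https://git.io/jddjm.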